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! ABSTRACT 

We use an array of high-resolution N-body simulations to determine the mass 
Cn| \ function of dark matter haloes at redshifts 10-30. We develop a new method for com- 

pensating for the effects of finite simulation volume that allows us to find an approx- 
imation to the true "global" mass function. By simulating a wide range of volumes 
at different mass resolution, we calculate the abundance of haloes of mass 10 5-12 
/i _1 Mq. This enables us to predict accurately the abundance of the haloes that host 
the sources that reionize the universe. In particular, we focus on the small mass haloes 
( }t 10 5 5_6 /i _1 M Q ) likely to harbour population III stars where gas cools by molecular 
hydrogen emission, early galaxies in which baryons cool by atomic hydrogen emission 
\Q \ at a virial temperature of ~ 10 4 K (~ 10 7 5_8 /i~ 1 Mq), and massive galaxies that 

may be observable at redshift ~10. When we combine our data with simulations that 
include high mass haloes at low redshift, we find that the best fit to the halo mass func- 
tion depends not only on linear overdensity, as is commonly assumed in analytic mod- 
els, but also upon the slope of the linear power spectrum at the scale of the halo mass. 
The Press-Schechter model gives a poor fit to the halo mass function in the simulations 
at all epochs; the Sheth-Tormen model gives a better match, but still overpredicts the 
abundance of rare objects at all times by up to 50%. Finally, we consider the conse- 
quences of the recently released WMAP 3-year cosmological parameters. These lead 
to much less structure at high redshift, reducing the number of z = 10 "mini-haloes" 
by more than a factor of two and the number of z = 30 galaxy hosts by more than four 
orders of magnitude. Code to generate our best-fit h alo mass function may be dow n- 
loaded from |http: / /ice. dur.ac.uk/Research/PublicDownloads/genmf jeadme.h tml 

Key words: galaxies: haloes - galaxies: formation - methods: N-body simulations - 
cosmology: theory - cosmology: dark matter 
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1 INTRODUCTION 

The numbers of haloes in the high redshift universe are criti- 
cal for determining the numbers of stars and galaxies at high 
redshift, for understanding reionization, and for guiding ob- 
servational campaigns designed to search for the first stars 
and galaxies. The reionization of the universe is thought to 
be caused by some combination of metal-free stars, early 
galaxies and accreting black holes (see e.g. Bromm & Lar- 
son 2004; Ciardi & Ferrara 2005; Reed et al. 2005 and refer- 
ences therein) , all of which are expected to lie in dark matter 
haloes, the numbers of which are, to date, highly uncertain 
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at these early times. This paper presents an array of cosmo- 
logical simulations of a wide range of volumes with which 
we determine the numbers of haloes over the entire mass 
range that is expected to host luminous sources in the high 
redshift universe. 

The first galaxies are expected to form within haloes 
of sufficiently high virial temperature to allow efficient cool- 
ing by atomic hydrogen via collisionally induced emission 
processes, which become strong at temperatures of ~10 4 K, 
providing the possibility of efficient star formation. Haloes of 
mass ~ 10 8 x [10/(1 + z)' A/2 ] ft _1 M a have the required virial 
temperature to host galaxies. Haloes with virial tempera- 
ture less than the threshold for atomic hydrogen line cooling, 
but larger than ~2,000K, often referred to as "mini-haloes", 
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have the potential to host metal-free (population III) stars 
that form from gas cooled through the production of H2 
and the resulting collisionally-excited line emission. The first 
stars in the universe are expected to form within such mini- 
haloes, which have masses as small as ~ 10 5_6 /i _1 Mo at 
redshifts of ~10-50. The inability of collapsing Eb-cooled 
gas to fragment to small masses, demonstrated in pioneer- 
ing simulations by Abel, Bryan, & Norman (2000, 2002) 
and by Bromm, Coppi, & Larson 1999, 2002), suggests that 
these first stars will be very massive ( £ IOOMq), luminous, 
and short lived, and will thus have dramatic effects on their 
surroundings. These population III stars begin the process 
of enriching the universe with heavy elements, and are ex- 
pected to have an important impact (directly or indirectly) 
on reionization. 

Early estimates of the numbers of haloes in the pre- 
reionized and the reionizing universe have relied upon an- 
alytic arguments such as the Press & Schechter (1974) for- 
malism or the later Sheth & Tormen (1999; S-T) function 
(further detailed in Sheth, Mo, & Tormen 2001; Sheth & 
Tormen 2002). For haloes at very high redshifts, which form 
from rare fluctuations in the density field, these analytic 
methods are in poor agreement with each other. At lower 
redshifts, halo numbers have been extensively studied using 
N-body simulations of large volumes. Simulations by Jenk- 
ins et al. (2001) show that the mass function of dark matter 
haloes in the mass range from galaxies to clusters is reason- 
ably well described by the Sheth & Tormen (1999) analytic 
function out to redshift 5, although with some suppression at 
high masses. Jenkins et al. proposed an analytic fitting for- 
mula for the "universal" mass function in their simulations. 
Warren et al. (2006) used a suite of simulations to measure 
the redshift zero mass function to high precision. Reed et 
al. (2003) used higher resolution simulations to show that 
the broad agreement with the S-T function persists down to 
dwarf scales and to z = 10, a result that was confirmed by 
the larger "Millennium" simulation of Springel et al. (2005). 
However, at z ~ 15, Reed et al. (2003) also found fewer 
haloes than predicted by the S-T function. Qualitatively 
consistent results have been found by Iliev et al. (2006), 
Zahn et al. (2006). These studies indicate that current an- 
alytic predictions of halo numbers are inaccurate at high 
redshift and demonstrate the need for N-body studies to 
determine the mass function at earlier times. 

Early attempts to simulate the formation of dark mat- 
ter haloes in the young universe suffered from effects re- 
sulting from the finite box sizes of the simulations, as 
noted by (e.g. White & Springel 2000; Barkana & Loeb 
2004). Recently Schneider et al. (2006) have modelled haloes 
large enough to host galaxies using PINOCCHIO (Monaco et 
al. 2002; Monaco, Theuns, & Taffoni 2002), a code that pre- 
dicts mass merger histories given a linear density fluctuation 
field. More directly, Heitmann et al. (2006) used N-body sim- 
ulations to show that haloes large enough to cool via atomic 
hydrogen transitions, and thus with the potential to host 
galaxies at redshifts 10 - 20, are well fit by the Warren et 
al. mass function, with the largest haloes suppressed relative 
to the S-T function by an amount consistent with that seen 
in Reed et al. (2003). 

Formation of the first haloes large enough to host galax- 
ies occurs as early as z ~ 35 (e.g. Gao et al. 2005; Reed 
et al. 2005), much earlier than the epochs at which the 



mass function has been calculated directly. The abundance 
of smaller haloes, capable of hosting population III stars, 
but too cool for atomic cooling, remains poorly constrained 
by numerical simulations; see however early work by Jang- 
Condell & Hernquist (2001). The major difficulty is the com- 
putational challenge of performing simulations with very 
high mass resolution within a volume that is large enough to 
sample fully the cosmological mass perturbation spectrum. 
At high redshifts, the effects of finite box size become par- 
ticularly important because the haloes to be sampled rep- 
resent rare fluctuations in the linear fluctuation spectrum. 
Since the mass function is steep, the numbers of such rare 
haloes are particularly sensitive to large scale, low ampli- 
tude density fluctuations. Finite box size effects worsen as 
one attempts to simulate the smaller volumes needed to re- 
solve lower mass haloes because fluctuations on the scale 
of the box become comparable to those on the scale of the 
halo. In ACDM cosmologies with spectral slope parameter 
n s = 1, the effective, local spectral index of the perturba- 
tion spectrum, n e ff, approaches —3 on the smallest scales, 
implying that fluctuations on a broad range of scales have 
similar amplitude (see further discussion in § 2) . As a re- 
sult, proper modelling of the power on scales much larger 
than the scale of the halo is important. Simulations of small 
haloes must therefore have a large dynamic range in order 
to model accurately all of the fluctuations that determine 
the formation and evolution of a halo. 

Several authors have estimated the effect of the finite 
simulation box size on the halo mass function using tech- 
niques based on assuming a simple cutoff in the power spec- 
trum of density fluctuations on scales larger than the box 
length (Barkana & Loeb 2004; Bagla & Ray 2005; Power & 
Knebe 2006; Bagla & Prasad 2006). While these techniques 
are able to account for the missing large-scale power, they 
do not account for cosmic variance, i.e. the run-to-run vari- 
ations introduced by the finite sampling of density modes, 
particularly at scales near the box size (e.g. Sirko 2005). 
Since the density field is derived from a set a discrete Fourier 
modes with maximum wavelength equal to the box size, the 
power at the largest wavelengths is determined by only a 
small number of realised modes. As a result, each random 
realisation of a simulation volume produces different large- 
scale structures. 

We introduce a technique, described in Section 2, 
which deals with the finite volume effects through a mass- 
conserving transformation of the halo mass function esti- 
mated from each individual simulation output. In order to 
verify the ability of this technique to account for finite vol- 
ume affects, we perform simulations of a wide, but closely 
spaced range of volumes, which results in large overlaps in 
redshift and in the range of resolved halo masses in different 
simulations. The agreement of the inferred mass functions 
in the regimes where halo masses overlap allows us to ver- 
ify the finite volume correction, and also allows us to rule 
out resolution dependencies of our results. Multiple realisa- 
tions of a single volume at identical resolution then test how 
well the correction to the inferred mass function is able to 
minimise the effects of cosmic variance. 

Our simulations are designed to extend the mass func- 
tion to small masses and high redshifts, covering a mass 
range of 10 5 to 10 12 /i _1 M Q , at redshifts 10 to 30, and we 
supplement them with low redshift data taken from other 
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studies. This extends the mass function down to masses 
small enough to include the "mini-haloes" capable only of 
hosting stars formed via H2-cooling, and determines more 
precisely the mass function of larger haloes which can host 
galaxies. In § 2, we define the halo mass function and out- 
line our method for dealing with finite volume effects. In 
§ 3, we discuss our suite of simulations of varying box sizes 
and resolutions. In § 4, we demonstrate the effectiveness of 
our techniques for correcting for finite volume and cosmic 
variance. We then present our mass function and compare 
it to previous works. In § 5, we consider the dependence of 
the mass function on cosmological parameters in the light of 
the recent WMAP third year results (Spergel et al. 2006). 
In § 6, we discuss some implications of our mass function for 
astrophysical models that rely on the mass function of high 
redshift haloes. Finally, our conclusions are summarised in 
§ 7. 

Except when otherwise indicated, we assume through- 
out a flat ACDM model with the following cosmological pa- 
rameters, which are consistent with the combined first year 
WMAP /2dFGRS results (Spergel et al. 2003): matter den- 
sity, Q m = 0.25; dark energy density, £Ia = 0.75; baryon 
density, fibaryon = 0.045; fluctuation amplitude, erg = 0.9; 
Hubble constant h = 0.73 (in units of 100 km s -1 Mpc -1 ); 
and no tilt (i.e. a primordial spectral index of 1). Note that 
our results should, in principle, be scalable to other values 
of cosmological parameters. 



2 THE HALO MASS FUNCTION 

In this section we define the notation that we use to describe 
the halo mass function and introduce our method for esti- 
mating the halo mass function from our N-body simulations. 
The simulations themselves are described in 5 3. 



2.1 Definitions 

The differential halo mass function, or halo mass function 
for short, dn/dM, is defined as the number of haloes of mass 
M per unit volume per unit interval in M. In this section, 
we introduce, for reasons that will become apparent later, 
an alternate pair of variables, f(a) and In a -1 , to describe 
the halo mass function. The quantity a is the RMS linear 
overdensity of the density field smoothed with a top-hat fil- 
ter with a radius that encloses a mass M at the mean cosmic 
matter density. For an infinite volume we have: 



a 2 (M) = 



b 2 (z) 
2tt 2 



k 2 P(k)W 2 {k;M)dk, 



(1) 



where P(k) is the linear power spectrum of the density fluc- 
tuations at z = 0, W(k; M) is the Fourier transform of the 
real-space top-hat filter, and b(z) is the growth factor of 
linear perturbations normalised to unity at z = (Peebles 
1993). The quantity In a~ can be thought of as a mass vari- 
able in the sense that higher values of In cr" 1 correspond to 
higher masses for a given redshift and matter power spec- 
trum. 

The quantity, f(a), to which we will refer as the mass 
function, is defined as the fraction of mass in collapsed haloes 
per unit interval in lncr - . If all matter is in haloes of some 
mass then: 



/ dlno- -1 = 1. 

The differential halo mass function is related to f(a) 
po diner -1 



(2) 



by: 



dn 
dM 



•fV) 



(3) 



M dM 

where po is the mean mass density of the universe. 

The function f(a) will depend on how haloes are de- 
fined. For this paper, in common with most recent work on 
halo mass functions, we use the friends-of-friends (FOF) al- 
gorithm (Davis et al. 1985) with a linking length of 0.2 times 
the mean inter-particle separation. In the appendix we in- 
clude mass functions using the SO algorithm (Lacey & Cole 
1994). 

The reasons for choosing to describe the halo mass func- 
tion in terms of the rather abstract variables / and In a^ 1 
are twofold. Firstly, the most commonly used analytical halo 
mass functions can be expressed compactly in terms of these 
variables. For example, the Press & Schechter (1974; P-S) 
mass function can be expressed as: 



/p-s(a) 



2 5 C 



■ exp 



A. 

2a 2 



(4) 



where the parameter S c = 1.686 can be interpreted physi- 
cally as the linearly extrapolated overdensity of a top-hat 
spherical density perturbation at the moment of maximum 
compression for an Einstein de-Sitter universe (Q m = 1). 
The evolution of S c , predicted by the spherical collapse 
model (e.g. Eke, Cole & Frenk 1996) as fi m ,z transitions 
from ~ 1 at high redshift to its present value, is sufficiently 
weak that we have ignored it in our treatment. Similarly, 
the Sheth-Tormen (S-T) mass function takes the form: 



/s-t(c) 



1 + 




s c 


a8 2 ~ 




— exp 
a 


2a 2 



(5) 



The choice of values A = 0.3222, a = 0.707 and p = 0.3 pro- 
vides a significantly better fit to mass functions determined 
from numerical simulations over a wide range of masses and 
redshifts than the P-S formula. 

Secondly, it has been found empirically, consistently 
with the analytic mass function formulae above, that the 
FOF mass function determined from cosmological simula- 
tions for a wide range of redshifts, and for a wide range of 
cosmological models can be fitted accurately by a unique 
function f(a) (e.g. S-T 1999, Jenkins et al. 2001, Reed 
et al. 2003, Linder & Jenkins 2003, Lokas, Bode & Hoff- 
mann 2004, Warren et al. 2006). A number of formulae for 
/(<t) have been proposed based on fits to simulation data 
and these are generally consistent at the ~ 10-30% with the 
largest differences occuring at the high mass end, where the 
rarity and steepness of the halo mass function make its es- 
timation rather challenging. The main aim of this paper is 
to determine the halo mass function at high redshift and to 
provide a fitting formula which applies to both our high and 
low redshift simulation data. 

It is appropriate here to question whether the halo mass 
function can really be expressed as a universal function of 
the form f(a) (see further discussion in S-T 1999). Structure 
formation in Einstein-deSitter cosmological N-body simula- 
tions in which the initial power spectrum is a (truncated) 
power- law (P(k) oc k n ) has been found to show self-similar 
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evolution (e.g. Efstathiou et al. 1998, Lacey & Cole 1994). 
For a given value of n, self-similar evolution implies a univer- 
sal form for /(a). However the function / could, in principle, 
be different for different values of the power-law index, n. 
There is suggestive evidence that this may indeed be the 
case in Fig. 1 of Lacey & Cole (1994). 

Thus, while empirically the CDM halo mass function 
appears to be well described by a function /(er), it may 
be that it is possible to improve the accuracy of a fitting 
formula by adding an additional parameter. For the CDM 
power spectrum where the slope curves gently a natural pa- 
rameter to take would be the local slope of the power spec- 
trum. At the small spatial scales relevant to the halo mass 
function at high redshift, the spectral slope approaches a 
critical value n = — 3 which marks the boundary between 
bottom-up hierarchical structure formation and top-down 
structure formation. One might expect that the need for an 
extra parameter would become apparent as this boundary is 
approached. In § 4.2, we find that we can improve the good- 
ness of fit to our mass function by using this as an extra 
parameter, and it is the high redshift simulation data which 
require this. 

2.2 Finite simulations and the global mass 
function 

Due to limited computing resources any cosmological N- 
body simulation can only model a finite volume of space. 
Periodic boundary conditions are usually implemented in 
order to avoid edge effects, with the most common geom- 
etry being a periodic cube. In this case, the overdensity of 
matter, <5(r), is given by a sum over Fourier modes: 

5 = 5k exp(ik.r), (6) 

k 

where the 5k are complex amplitudes which obey a reality 
condition <5£ = <$-k- Because the simulation volume must 
have mean density, <5k=o = 0. The Fourier modes have 
wavenumbers k = 27r/L bax (i, m, n), where l,m,n are inte- 
gers and Lbox is the side-length of the simulation volume. 

The initial conditions for a simulation of a CDM uni- 
verse with adiabatic density perturbations require that the 
initial density field should be a Gaussian random field. In 
this case, the phases of the wave amplitudes, <5k, are random, 
and the amplitude of each mode is drawn from a Rayleigh 
distribution (Efstathiou et al. 1985) where: 

< \5 k \ 2 >=P(fc)/L box (7) 

and the brackets <> denote an ensemble average over real- 
isations. 

For a periodic cosmological simulation, the smoothed 
rms linear overdensity, er, is given by the discrete analog of 
Eqn.[[] 

a 2 (M) = b 2 (z)Y,\Sk\ 2 W 2 (k;M), (8) 

k 

where |<5k| refers to the linear amplitude of the Fourier modes 
at z = 0, and b and W are the same as in Eqn. [T] 

A number of authors (Power & Knebe 2006; Sirko 2005; 
Bagla & Prasad 2006) have highlighted the problem that the 
halo mass function in a finite periodic box will differ between 



realisations and that the ensemble average mass function 
will not be the same as in the limit of an infinitely large 
simulation volume. The effects of having a discrete power 
spectrum with only a small number of modes with wave- 
lengths comparable to the size of the cube and no power 
with wavelengths larger than the cube are particularly im- 
portant for small cosmological volumes where the contribu- 
tion to the variance of the density field from each successive 
decade of wavenumber is a very weakly increasing function. 
Given the computing resources available to us and the re- 
quirement that we should resolve haloes with a hundred or 
more particles at high redshift, it is inevitable that the vol- 
umes we wish to simulate will be affected significantly by 
finite box effects. 

Our approach to minimise finite box effects and to es- 
timate the high redshift halo mass function is to make the 
ansatz that the universal form of the halo mass function 
correctly describes the halo mass function even in volumes 
where finite volume effects are significant. For small volumes 
it is important to use the correct relation between M and er 
given by Eqn.[S]for each individual simulation in order to es- 
timate the halo mass function in /(er) — In cr^ 1 space. Having 
estimated /(<x) we can now predict the halo mass function, 
Eqn. [3] for the astrophysically interesting case of an infinite 
volume, using the relation between a and M given by Eqn.[T] 
We call this mass function the global mass function. 

It not obvious a priori just how successful this approach 
will be in recovering the global mass function. However, 
it is worth noting that this approach contains inherently 
the essential elements present in conditional mass function 
methodology wherein the number of haloes within a local 
patch is estimated (Mo & White 1996, Bower 1991, Bond 
et al. 1991, Lacey & Cole 1993). If we take our simula- 
tion volume to be a patch of universe with mass Af pa tch and 
mean density, then the variance that we measure within the 
simulation volume is close to c| lobal (M) - o-| lobal (M patch ), 
where o~ 2 lobal (M) ls given by Eqn. [1] The mass function 
in such a finite patch is reliably predicted by substituting 
o&obA M ) - CT giobai (Mp a tch) for cr| lobal (M) into the func- 
tional form of the mass function (Sheth & Tormen 2002). 
Our methodology is an improvement on such an approach 
because we also include the effects of run to run "cosmic" 
variance. As will be discussed in § 4, we find that our method 
does work very well in practice. To demonstrate this one 
needs a large number of simulations with multiple random 
realisations at a fixed box size and a variety of differing box 
sizes. We describe our suite of simulations in the next Sec- 
tion. 

In practice, we determine a(M) for a particular simula- 
tion by measuring the power spectrum of the initial condi- 
tions. We perform a sum over the low-k modes, but switch 
to an integral over the linear power spectrum for wavenum- 
bers greater than l/20th of the particle Nyquist frequency 
of the simulation. At the changeover point the number of 
independent modes is sufficiently large that the difference 
between doing a summation or an integration is negligible. 

Our procedure for correcting for the finite simulation 
volume is more direct than and is simpler in practice than 
the Cole (1997) modification of the Tormen & Bertschinger 
(1996) mode adding procedure (MAP). In the MAP algo- 
rithm, an evolved simulation is replicated onto a linear den- 
sity field of a larger volume, adding displacements from the 
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long wavelengths to the replicated particle positions to ap- 
proximate the effects of large scale power, thereby increas- 
ing the effective volume of the simulation. Adding a large 
scale density perturbation has the effect of changing the lo- 
cal value of fi m , which also changes the linear growth factor, 
and in effect, changes the redshift. This means that in order 
to produce the equivalent of a large volume simulation snap- 
shot, replicated particles from the small volume simulation 
must be temporally synchronized according to what large 
scale density they are being "mapped" onto (Cole 1997). 
Although the MAP approach is promising, it has not been 
thoroughly tested in the regime in which we are interested, 
and it does not take into account directly the coupling be- 
tween small and large scales, which has the potential to af- 
fect halo formation. 



3 THE SIMULATIONS 

We use the parallel gravity solver L-GADGET2 (Springel et 
al. 2005) to follow the evolution of dark matter in a num- 
ber of realisations of different cosmological volumes. Table [1] 
lists all our simulations and the numerical parameters used. 
The highest resolution simulations have particle mass of 10 3 
/i _1 M0 and resolve haloes to redshifts as high as 30. Our 
new simulation volumes range from 1 to 100 /i _1 Mpc on 
a side. For these runs, the cell size of the mesh used by 
L-GADGET2 in the PM portion of the Tree-PM force algo- 
rithm to compute long range gravitational forces is equal to 
one half the mean particle spacing. We also include results 
of the 500 /i _1 Mpc "Millennium run" (Springel et al. 2005). 
For low redshift haloes, we include analysis of the 1340 
h~ Mpc run by Angulo et al. (2006, in preparation), and the 
3 ft _1 Gpc "Hubble volume", which has fl m = 0.3, Q,a = 0.7, 
and as = 0.9, was run using HYDRA (Couchman, Thomas 
& Pearce 1995; Pearce & Couchman 1997), and uses the 
Bond & Efstathiou (1984) transfer function (see Colberg et 
al. 2000 and Jenkins et al. 2001 for details). We have verified 
the robustness of our results to the choice of run param- 
eters by varying individually the starting redshift (z sta rt), 
the fractional force accuracy (Af orcc acc = 0.005), the soft- 
ening length (r so f t ), and the maximum allowed timestep 
(A t = Aln(l + z)" 1 ); these tests are detailed in the Ap- 
pendix. 

Initial conditions for runs with box length of 50 /i -1 Mpc 
or smaller were created using the CMBFAST transfer func- 
tion (Seljak & Zaldarriaga 1996) as follows. Traditionally, 
the initial conditions are generated from a transfer func- 
tion calculated at z — and extrapolated to the initial 
redshift using linear theory. However, in order to avoid a 
high wavenumber (k) feature^ in the CMBFAST z = 
transfer function, our adopted transfer function consists of 
the z = transfer function for small k spliced together 
at k = 1 /iMpc" 1 with a high redshift (z = 599) transfer 

1 We noticed a feature in the CMBFAST transfer function at 
k ~ 10 2 ' 5 MVfpc - 1 , where the slope steepens for approximately a 
decade in k, resulting in a power spectrum that briefly becomes 
steeper than the theoretical asymptotic minimum slope of n = — 3 
at the smallest scales for a primordial spectral index n s = 1. 
This unexpected feature is not present in high redshift (z 2> 100) 
computations of the CMBFAST transfer function. 



Table 1. Simulation parameters. N runs random realisations of 
cubical volumes of side Lb ox were simulated, from redshift Zstart 
to z nn . Npart particles of mass M par t and gravitational force soft- 
ening length r so f t were employed. 



Nruns 


Lbox 


Mpart 




Npart 


z fin 


Zstart 


r ft 




/i -1 Mpc 


/i _1 M 










h~ 1 kpc 


11 


1.0 


1.1 X 


10 3 


400 3 


10 


299 


0.125 


1 


2.5 


1.1 X 


10 3 


1000 3 


10 


299 


0.125 


3 


2.5 


1.1 X 


10 3 


1000 3 


30 


299 


0.125 


1 


2.5 


8.7 x 


10 3 


500 3 


10 


299 


0.25 


1 


2.5 


1.4 x 


10 6 


200 3 


10 


299 


0.625 


1 


4.64 


1.1 X 


10 5 


400 3 


10 


249 


0.58 


2 


11.6 


1.1 X 


10 5 


1000 3 


10 


249 


0.58 


1 


20 


8.7 x 


10 6 


400 3 


10 


249 


2.5 


2 


50 


8.7 x 


10 6 


1000 3 


10 


299 


2.4 


1 


100 


9.5 x 


10 7 


900 3 


10 


149 


2.4 


1 


500 1 " 


8.6 x 


10 s 


2160 3 





127 


5.0 


1 


1340tt 


5.5 x 


10 10 


1448 3 





63 


20 


1 


3000"+ 


2.2 x 


10 12 


1000 3 





35 


100 



t "Millennium" run (Springel et al. 2005) 
tt Angulo et al. (2006) 

t+t "Hubble Volume" (Colberg et al. 2000; Jenkins et al. 2001) 



function for large k. Power is matched on either side of the 
splice. The location of the splice is chosen to be at a point 
where the shape of the transfer function has essentially no 
redshift dependence, thereby ensuring continuity of spec- 
tral slope. We use a combined mass-weighted dark matter 
plus baryon transfer function. The 100 /i Mpc run and the 
1340 ft _1 Mpc runs both used the transfer function used for 
the Millennium simulation, which is detailed in Springel et 
al. (2005). 

Our choice of initial conditions and simulation tech- 
niques neglects direct treatment of baryons. The method 
that we implement is ideal for our purposes of modelling the 
dark matter halo mass function and assessing its universality 
given an input dark matter fluctuation spectrum. However, 
for the purpose of making highly accurate predictions of the 
numbers of haloes in the real universe, coupling of baryons 
to photons, and subsequently to dark matter can be impor- 
tant at the high redshifts that are involved. For example, at 
the starting redshift, the baryons are much more smoothly 
distributed than the dark matter. The ensuing evolution of 
the dark matter distribution is then affected as the baryon 
fluctuations begin to catch up to the dark matter. We refer 
the reader to further discussions regarding these and related 
issues by e.g. Yamamota, Noashi & Sato (1998), Yoshida, 
Sugiyama & Hernquist (2003), Naoz & Barkana (2005) and 
Naoz, Noter & Barkana (2006). 



4 RESULTS 

4.1 The mass function 

In Fig. [T] we show the simulation mass functions at redshifts 
ten, twenty, and thirty. The left panel shows the measured 
raw abundance of haloes within the (finite) simulation vol- 
umes. In the right panel, the global mass function is plotted 
using the transformation explained in section 2.2. Opera- 
tionally this is done as follows. The group finder returns a 
group catalogue for a simulation consisting of a list of the 
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number of haloes of each mass. Suppose in the catalogue 
there are J haloes with an average mass M. Eqn. [8] is used 
to find the value of a that corresponds with mass M for this 
particular simulation. Applying Eqn. [1] we can determine a 
mass M' which, for an infinite volume, has this same value 
of a. We can effectively 'correct' the catalogue to yield a 
new catalogue for the same volume of space as the original 
simulation but sampled from an infinitely large simulation 
volume. To do this, each of the masses M in the catalogue is 
replaced with the corresponding mass M' , and the number 
of haloes J is replaced by a value J', such that for mass to 
be conserved in the transformation JM = J'M' . Note that 
while J is an integer, J' will not, in general, also be one. 
The corrected catalogue can then be used to construct an 
estimate of the global differential halo mass function. 

Because of the missing power in smaller volumes, the 
net effect of the transformation is an increase in the mass 
and a decrease in the abundance of a given bin in such a way 
that the resulting adjusted mass function is higher at a given 
mass. Note that, with the correction, the variation between 
simulations in Fig. \T\ is very much reduced and the agree- 
ment between different box sizes is much improved. Once 
this transformation has been made, the simulated mass func- 
tion lies nearer, but generally below the Sheth-Tormen func- 
tion for the most massive objects at redshift ten to thirty. 
The Press- Schechter mass function is a poor match to the 
simulation data, especially at high masses. 

It is instructive to plot the fraction of collapsed mass, 
/(<r), as a function of lncr -1 . This fraction is independent of 
redshift according to the principles underlying P-S or S-T 
models. In FigO we plot f(a) as a function of In <r -1 , includ- 
ing the correction for finite volume. The fact that the data 
over a wide range of redshifts all coincide approximately in 
a single form is an indication of the general redshift inde- 
pendence of the mass function. However, we discuss in § 3.2 
some evidence for a weak dependence on redshift. Haloes 
formed from rare fluctuations - high mass, high redshift, or 
both - lie at large values of 5 c /a and hence large \ncr~ 1 . 
Here the mass function is steepest. Note that rarer haloes 
do not necessarily have lower spatial abundance. This can 
be understood by comparing a high redshift low mass halo 
with a low redshift high mass halo, each forming from e.g. 
a 5-<t fluctuation (J8 c /a(M,z)] — 5). In the case of the low 
mass, high redshift halo, the number of regions per comov- 
ing volume element that contain the halo's mass is larger, 
which results in a higher comoving halo abundance. 

An important quantity is the cumulative fraction of 
mass contained in haloes. In Fig. [3] we plot the ratio of 
/(> M) divided by the S-T function. In the right panel, the 
global mass function has been corrected using the actual 
power from each simulation in the relation between halo 
mass and variance (Eqn. [SJ, as in Fig. [T] which automati- 
cally accounts for finite volume effects. The greatly reduced 
run-to-run scatter compared to the raw, uncorrected mass 
function highlights the improvement gained by using a more 
accurate relation between halo mass and variance. The cor- 
rection for limited simulation volume is evidenced by the 
systematic upward shift in the cumulative mass fraction, 
which is strongest for small boxes, and for high mass and/or 
redshift "rare" haloes. 




In a~ l 

Figure 2. The fraction, f(o"), of collapsed mass per unit lnc -1 , 
where a 2 is the variance, at z = 10, 20, and 30. The mass fraction 
has been adjusted to account for finite volume effects as described 
in § 2.2. Approximate redshift invariance is indicated by the fact 
that all redshifts have roughly the same mass function. 

4.2 Fitting the mass function 

An analytic form for the mass function is an essential ingre- 
dient for a wide array of models of galaxy formation, reion- 
ization, and other phenomena, and is also required for cos- 
mological studies based on observable objects whose num- 
ber density depends on the halo mass function. In Fig. [4] we 
show our data along with several analytic functional fits; see 
also Fig.JIJi-d, where we plot the mass function split by red- 
shift. The error bars in Figs. 14161 are obtained by computing 
the square root of the number of haloes in each mass bin of 
a given simulation (see Appendix B for further discussion of 
uncertainty estimates) . 

The S-T function provides a reasonable fit except for 
rare haloes (large 8 c /a(m, z)), where the simulations pro- 
duce ~50% fewer objects. The P-S function is a poor fit 
at all redshifts. Of the previously published fits, Reed et 
al. (2003) is the most consistent with our combined high 
and low redshift data, fitting the data with an rms differ- 
ence of 11%, i.e. x 2 = 1 if 

we artificially set the uncertainties 
to be equal to the Poisson errors plus 11% of the measured 
abundance, added in quadrature. The Jenkins et al. function 
is an excellent fit to our low redshift simulation data, but it 
matches the high redshift data less well. This comparison, 
however, requires extrapolating the function beyond its in- 
tended range of validity,namely the original fitted range of 
-1.2 < lno- -1 < 1.05.0 The Warren et al. (2006) curve, 

2 Due to differences in binning the data in the regime where the 
mass function is steep, we find the z = 10 mass function in the 
Millennium run (the six rightmost z = 10 points in Fig. [4} to 
be ~ 10 — 20% lower than in Springel et al. (2005), who found 
somewhat better agreement with the Jenkins et al. fit. 
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Log 10 M [h-'M s ] log 10 M [h-'Mj 

(a) Raw simulation mass function (b) Global mass function 

Figure 1. Differential simulated mass function of friends- of- friends dark matter haloes for redshift 10, 20, and 30 compared with the 
Sheth & Tormen and Press & Schechter analytic predictions. The corrected mass function (right panel) makes a correction for cosmic 
variance and for finite volume to the simulation mass fluctuation spectrum by using the relation between a 2 and mass derived from the 
power spectrum of the initial particle distribution for each realisation; i.e. the left panel uses cr^ (m) (Eqn. [T] the variance versus mass 
for an infinite universe) and the right panel utilises a 2 , (m) (Eqn. [8] the actual variance for each realisation). Note the reduced run to 
run scatter and increased amplitude of the corrected mass functions for small boxes. For comparison purposes, the 100 and 500 h~ 1 Mpc 
runs have been rescaled by the ratio of their expected S-T mass functions to account for their mildly different transfer functions. The 
bin width here and throughout the paper is dlogigM = 0.125. 



which is very similar to the Jenkins et al. form over its orig- 
inal fitted range, fits our lowest redshift data quite well, but 
it is not as good a fit to our high redshift results. 

We now consider whether our data support an improved 
fit compared to published analytic mass functions. We define 
the effective slope, n e s, as the spectral slope at the scale 
of the halo, where P(kh) oc fc^ cff and kh — 2n/ro, with 
ro the radius that would contain the mass of the halo at 
the mean cosmic density. If we limit the fit to a redshift 
independent form, with the assumption of no dependence on 
n c fr, our simulation data can be fit by steepening the high 
mass slope of the S-T function (Eq. [5} with the addition of 
a new parameter, c = 1.08, in the exponential term, and 
simultaneously including a Gaussian in lncr -1 centred at 
lncr -1 = 0.4, as described by the following function, which 
is otherwise identical to the S-T fit: 



1 + 



Gi = exp 



r ) P + 0.2Gi 



2(0. 6)' 2 







IT exp 


~ la' 2 



(9) 



For analytical modelling purposes, it is useful to recast this 
equation in terms of a new variable lo 2 = ca8 2 /a 2 



/O) = A'lo 



1 + \muj 2p +0.2Gi 



exp 



(10) 



Gi = exp 



[ln(u 



) -0.788]-* 
1 



2(0 



where A! = 0.310 and ca = 0.764. The resulting function is 
comparable to the Reed et al. (2003) fit, and is generally con- 
sistent with the Sheth & Tormen 2002 modification to the 
S-T function with a=0.75 instead of a=0.707 (not plotted). 
Note that this modification means that the original normal- 
isation criteria - that all mass be contained in haloes, (Eqn. 
[2j — is not satisfied exactly; instead, 98% of mass is contained 
in haloes. It is remarkable that our data at all redshifts over 
a vast range in masses are generally consistent with a sin- 
gle functional fit that is solely a function of the variance, 
independent of redshift. However, while this redshift inde- 
pendent function appears reasonable at high redshift, it is 
relatively poor at z — 0. 

Careful inspection reveals tentative evidence for a de- 
pendence on some additional free parameter(s). The mass 
function at z ^ 10 is suppressed, at levels of £ 10 — 20%, 
relative to lower redshifts, indicating a weak dependence of 
the mass function on redshift. However, since a given value 
of a corresponds to different masses at different redshifts, it 
is unclear whether the apparent trend with redshift masks 
a dependence on mass or on some other parameter. Regard- 
less of the cause, inclusion of an additional parameter in 
the mass function provides a better fit to our data, as we 
now show. We consider the possibility that the mass func- 
tion may be affected by n c ff, the power spectral slope at the 



© 2003 RAS, MNRAS 000,riJfT5l 



8 Reed et al. 




(a) Raw simulation mass function 



100 




(b) Global mass function 



Figure 3. Cumulative fraction of mass contained in FOF haloes. Left panel assumes cr 00 (m) (Eqn.. [TJ, the variance for an infinite 
universe). Right panel utilises o- sim (m) (Eqn. [8}, the variance- mass relation derived from the mass power spectrum of the initial particle 
distribution. Accounting for missing large-scale power in this way results in the systematic upward shift in the corrected mass function. 
Reduced run-to-run scatter and improved agreement between different box sizes (different colours) indicates the effectiveness of the finite 
volume correction to the mass function. 



scale of the halo radius. An improved fit can be made at each 
redshift with the introduction of n e g in the analytic func- 
tion, as given by the following formula, again a modification 
to the S-T function: 



f(a,n eff )=A^/f 



1+ (^7) +O.6G1 +0.4G 2 



(11) 



x — exp 



Gi = exp 
G 2 = exp 



_ caSj _ 0.03 (S c \ 
2a' z (n c{f + 3Y 2 \ a ) 

(In o-^ 1 -0.4) 



2(0. 6) 2 



(ln<7~ 1 -0.75) 2 
2(0. 2)* 



where c = 1.08, and G\ and G2 are gaussian functions in 
lncr -1 . This can be rewritten for the purpose of more con- 
venient modelling as 



f(a,n cS ) = 



w _ 0.0325^ p 
2 (n off +3p 



[ln(L;)-0.788] 2 
2(0. 6)^ 



[ln(^)-1.138] J 
2(0. 2)' 1 



1 + 1.02w 2p + O.6G1 + 0.4G 2 (12) 
xA'uiy/^ exp 
Gi = exp 
G 2 = exp 

where A' = 0.310 and ca — 0.764. This function fits the data 



to 4% rms accuracy, significantly better than the 15% rms 
accuracy of the single parameter fit of Eqn. [TO] 

The new analytic mass function is presented for red- 
shifts zero through thirty in Fig. [S^-d. The broad "bump" 
over the S-T function in the 2 = mass function centred 
near lncr -1 = 0.4, which is also present in the Jenkins et 
al. and Warren et al. fits, is produced by the Gaussian func- 
tions in In a" 1 space. The n e g term introduces a redshift 
dependence that increasingly suppresses the mass function 
as n e s approaches -3, and becomes stronger for rarer haloes. 
From Eqn. [T] it is easy to show that for a pure power-law 
fluctuation power spectrum, a 2 a M~( n ° lt+3 ^ 3 , which can 
be reparameterized as 



"-off = 6 



. d In a 
d In M 



(13) 



At fixed redshift, n e ff is thus a proxy for halo mass. At the 
smallest scales, the CDM power spectrum asymptotes to 
n c g — —3 for a primordial spectral slope n s = 1 We have 
computed n a g using Eqn. [13] throughout this paper. How- 
ever, for convenience, since n c g is nearly linear with lna -1 
over relatively small ranges in M, n e fi can be approximated 
to better than 10% in (n e g + 3) by the following function 
within the mass and redshift range of haloes in this paper 
and for as = 0.9: 



"-off 



m z In a 1 + r z 



0.55-0.32 



(14) 



■1 + z' 
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Figure 4. Differential global (corrected for finite volume; see 
text) mass function of friends- of -friends dark matter haloes for 
redshift 0, 1,4, 10, 20, and 30 compared with analytic fits. Thick 
curves correspond to the modified S-T function (Eq. HOC . Error 
bars denote 1-a poisson uncertainties. 



-1.74-0.8 
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In Fig. [6] we plot the ratio of the simulation data to the 
new analytic fits, with (panel b) and without (panel a) the 
n e s dependence. The better fit obtained when n e fi is in- 
cluded suggests that the halo mass function is not redshift 
independent and thus cannot be described solely by the sin- 
gle parameter a(m,z). However, panel a) shows that any 
dependency on additional parameters is very weak. Nev- 
ertheless, the precise causes of this apparent dependency 
warrant further study. Interestingly, the form of n e g depen- 
dence modelled in peaks theory (e.g. Sheth 2001) predicts a 
smaller difference between the numbers of high redshift and 
low redshift haloes than we find in our simulations. 



WMAP and other CMB experiments combined with the 
2dFGRS, imply a fluctuation amplitude significantly smaller 
than is commonly assumed, and also suggest a spectral index 
smaller than 1. Both of these parameters have a significant 
impact on the number of small haloes at high redshift. 

Figs. [7] and [5] show that, compared to the cosmology 
assumed in the rest of this paper (as = 0.9, n s — 1.0, 
fl m = 0.25), the cosmological parameters inferred from the 
WMAP-3 data imply a factor of 5 decrease in the number of 
candidate galaxy hosts at z = 10 with mass ~ 10 8 /i _1 M , 
and more than four orders of magnitude decrease in the 
number of potential galaxies at z = 30 with mass ~2x 10 7 
h~ Mq . Smaller "mini-haloes" which could host population 
III stars are also strongly affected. The WMAP-3 cosmology 
implies less than half the number of "mini-haloes" of mass 
~ 10 6 /i _1 Mq at z = 10 and a reduction by more than three 
orders of magnitude at z = 30 relative to our standard cos- 
mology. Note that the comoving abundances in Fig.[7]do not 
match exactly the values in Fig. [1] because a slightly differ- 
ent transfer function was used for some of the simulations, 
as discussed in 82.1. 



The effect of the WMAP-3 cosmological parameters can 
also be interpreted either as reducing the mass of typical 
haloes at a given redshift, or as introducing a delay in the 
formation of structure. For example, haloes with number 
density 1 /i 3 Mpc~ 3 at z = 10 would have a mass approx- 
imately four times larger in our standard cosmology than 
in the WMAP-3 cosmology; for the same fixed number den- 
sity, this becomes a factor of ~10 at z — 20 and a factor 
of ~25 at z — 30. For haloes of a fixed mass, the reduced 
as and n s of the WMAP-3 cosmology delay halo forma- 
tion. For example, haloes of mass of lO 8 /i -1 M0 are delayed 
by A z ~ 6 (z—21 to z=15) before reaching a comoving 
abundance of 10" 2 fe 3 Mpc~ 3 . Haloes of 1O 6 /i _1 M , approx- 
imately the mass where H2-cooling becomes strong enough 
to trigger star formation, do not reach an abundance of 
1 ft 3 Mpc~ 3 until z = 21 for the WMAP-3 cosmology com- 
pared to z — 30 for our standard cosmology. This means 
that widespread population III star formation and galaxy 
formation would occur significantly later if the WMAP-3 
cosmological parameters were correct. 



5 SENSITIVITY TO COSMOLOGICAL 
PARAMETERS 

Our general results are unaffected by the exact values of the 
cosmological parameters because the fit of the mass func- 
tion in the simulations to an analytic form is, in principle, 
independent of the precise relation between variance and 
mass (although a dependence on n e fi introduces a weak de- 
pendence on cosmological parameters through the relation 
between n e fi and a(m, z)). Studies involving a wide range 
of cosmological parameters (e.g. Jenkins et al. 2001; White 
2002) have ruled out a strong dependence of f(cr) on matter 
or energy density. This allows one to estimate the halo abun- 
dance for a range of plausible cosmological parameters using 
purely the analytic mass function determined by a(m, z). In 
particular, the third year WMAP results (WMAP-3), which 
confirm the analysis by Sanchez et al. (2006) of the first year 



The uncertainties that remain in the values of cosmo- 
logical parameters translate into significant uncertainties in 
the number of high redshift haloes that can potentially host 
luminous objects, especially those haloes that host the first 
generations of stars and galaxies. This adds major uncer- 
tainty to predictions of the abundance of potentially de- 
tectable haloes in the pre-reionized universe, such as those 
modelled in e.g. Reed et al. (2005). However, the sensitivity 
of halo number density to cosmological parameters suggests 
the exciting prospect of using the number of small haloes 
at high redshift as a cosmological probe if future studies 
are able to establish their number density accurately. Such 
measurements of high redshift halo numbers would require 
not only extremely sensitive observations, but also further 
theoretical work to better understand the relation between 
observable properties and halo mass. 
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Figure 5. Differential global (corrected for finite volume; see text) mass function of friends- of- friends dark matter haloes for redshift 10, 
20, and 30 compared with analytic fits, including our new two-parameter fit (Eqn. |1H , which includes a dependence on spectral slope, 
n e ff , and the resulting redshift dependence. 



6 DISCUSSION 

Our simulations give the mass function of dark matter haloes 
out to redshift 30 and down to masses that include the small- 
est haloes likely to form stars, "mini-haloes" whose baryons 
collapse through Hb cooling. Thus, we now have a precise 
estimate of the mass function of the haloes that contain all 
the stellar material at observable redshifts, and at virtually 
all the redshifts that are potentially observable in the fore- 



seeable future. Our fits were obtained using simulations of 
the ACDM cosmology for a specific set of cosmological pa- 
rameters, but they are readily scalable to other values. 

These haloes may be detected in a numbers of ways. 
Large haloes (those with T v i r > 10 4 K), which have the po- 
tential to host galaxies formed by efficient baryon cooling, 
might be observed out to the epoch of reionization with the 
JW Space Telescope, or perhaps even with current gener- 
ation infrared observatories. Smaller haloes may not be di- 
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Figure 6. Ratio between the differential global (corrected for finite volume; see text) mass function of friends- of- friends dark matter 
haloes at redshift 0, 1, 4, 10, 20 and 30, and our single parameter fit (left; Eqn, 11 b : ratio between the same simulation data and a new fit 
that includes n e ff as a second parameter (right; Eqn. lilt . The improved fit in the right panel shows that our results are better described 
by a two-parameter function, rather than simply a function of lncr -1 . This is evidence for a weak redshift dependence (via n ef j ) of the 
mass function. 



rectly observable. However, their stellar end-products might, 
as gamma-ray bursts (e.g. Gou et al. 2004), or as supernovae 
at redshifts well above the epoch of reionization (e.g. Wein- 
mann & Lilly 2005). 

Knowledge of the halo mass function will be important 
in the interpretation of data from the Low Frequency Ar- 
ray (LOFAR), the Square Kilometer Array (SKA) or other 
rest-frame 21cm experiments designed to discover and probe 
the epoch of reionization. It may even be possible to use the 
halo mass function to help break degeneracies between cos- 
mological parameters and the astrophysical modelling of the 
first objects. 

Our results for the halo mass function at high red- 
shift imply that predictions for the abundance of dark mat- 
ter haloes that may host galaxies, stars, gamma-ray bursts 
or other phenomena based on the commonly used Press- 
Schechter model grossly underestimate the number of 3 a or 
rarer haloes. This includes large galaxies at z ~ 10, all haloes 
large enough to host galaxies at z ^ 15, and all haloes capa- 
ble of hosting stars at z ^ 20, assuming that a halo must be 
at least ~ 10 s h~ 1 M Q to host a galaxy and 1O 6 /i _1 M to 
form stars. The P-S function underestimates the true mass 
function by a factor ~ 5 for the rarest haloes that we have 
simulated, which applies to galaxy candidates at z — 30, 
and large (~ 10 11 /i _1 Mq) galaxies at z = 10. The abun- 
dance of mini-haloes likely to host Population III stars are 
underestimated by the P-S function by a factor of at least 
two at z = 30. Studies that assume the Sheth & Tormen 
function are more robust, but they still suffer from an over- 
estimation of the numbers of large haloes at high redshift, 
particularly of large galaxies at z ^ 10, extending to all po- 



tential galaxies at z £ 20, and to all star-forming haloes at 
z ^ 30, reaching a factor of up to ~3 for the rarest haloes 
in our simulations. However, the S-T function overpredicts 
the number of mini-haloes only by less than ~ 20% at ,$ 20 
and by ~ 40% at z = 30. 



7 SUMMARY 

We have determined the mass function of haloes capable 
of sustaining star formation from redshift 10 to 30, a period 
beginning well before reionization and extending to redshifts 
below those where reionization occurred according to the 
WMAP 3-year estimates. This extends the mass function to 
lower masses and higher redshifts than previous work, and 
includes the "mini-haloes" that probably hosted population 
III stars. Our main results may be summarised as follows: 

• We have presented a novel method for correcting for 
the effects of cosmic variance and unrepresented large-scale 
power in finite simulation volumes. This allows one to infer 
more accurately the true global mass function, ultimately al- 
lowing the mass function to be computed to smaller masses. 
We have verified the robustness of this method by carry- 
ing out simulations of a wide range of volumes and mass 
resolutions and comparing the inferred mass functions for 
overlapping halo mass ranges. By simulating multiple real- 
isations of identical volumes, we show that the run-to-run 
scatter in the mass function, caused by cosmic variance, is 
minimised by our method. 

• Throughout the period 10 < z < 30, the halo mass 
function is broadly consistent with the Sheth & Tormen 
(1999) model for haloes 3a and below. For rarer haloes, the 
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Figure 7. Differential analytic mass function for the WMAP 
3-year cosmological parameters (Spergel et al. 2006), compared 
with the mass function for the standard cosmological param- 
eters assumed in the simulations of this work. The curves 
marked "WMAP 3yr" use the preferred WMAP 3-year param- 
eters (e.g. <Tg,n s = 0.74,0.951), including the effect of the pre- 
ferred parameters(e.g. Q m , ^baryom e tc) on the transfer function. 
The abundance of massive, high redshift haloes is highly sensi- 
tive to cosmological parameters due to the steepness of the mass 
function. 



mass function drops increasingly below the S-T function - 
by up to ~ 50% for ~ 5a haloes. 

• Our data are reasonably well fit by a redshift- 
independent function of a(m,z), the rms linear variance in 
top hat spheres. We provide a 1-parameter fit to the mass 
function which is a modified version of the Sheth & Tormen 
(1999) formula. However, an even better fit can obtained if 
the mass function is allowed to depend not only on a(m, z), 
but also on the slope of the primordial mass power spec- 
trum, n e g. This improvement implies that that the fraction 
of collapsed mass does not depend solely on the rms linear 
overdensity, as is assumed in Press-Schechter theory. The 
P-S formula, in fact, provides a poor fit to must of our data. 

• The halo abundance at z ^ 10 is highly sensitive to 
o"8 and n„, parameters recently adjusted downwards in the 
reported WMAP 3-year results (Spergel et al. 2006). The 
new estimates imply greatly reduced numbers of high red- 
shift haloes of a given mass. The sensitivity of high-redshift 
halo numbers to these parameters suggests their potential 
as a useful cosmological probe in future. 

Code to generate our best-fit halo 
mass function may be downloaded from 




http: / /ice. dur.ac.uk/Research/PublicDownloads/genmf_readme 
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APPENDIX A: NUMERICAL TESTS 

Al Run parameter convergence tests 

In Fig. IA1I we have tested some of the primary runtime 
parameters of L-Gadget2 using a 10 3 h~ 1 M ( v ) particle res- 
olution 1 /i -1 Mpc volume. These tests confirm that our 
choices of starting redshift (z sta rt), fractional force accuracy 
(Af orco acc = 0.005), softening length (r so f t ), and maximum 
allowed timestep (At = Aln(l + z) _1 ) are sufficient. Of the 
parameters that we have tested, z sta rt has the most effect on 
our results. At z=10, z sta rt = 119 is indistinguishable from 
earlier starting redshifts. However, at z—20, the z star t = 119 
mass function is suppressed by ~ 10 — 20% relative to the 
Zst a rt = 299 runs, and by more than 50% at z=30. At z=30, 
the z s t ar t = 299 run is suppressed relative to the z star t = 599 
run by ~ 10 — 20%, which could indicate a small bias in our 
z=30 mass function, but not large enough to affect signif- 
icantly our conclusions. It thus appears that a simulation 
must be evolved a factor of ~10 in expansion factor in order 
to solve accurately (within 10-20%) the mass function for the 
mass resolutions and outputs we have considered, if one uses 
the Zel'dovich (1970) approximation to set up initial condi- 
tions, as we have. Note that the convergence of our runs 
with increased starting redshift implies that the Zel'dovich 
approximation is valid. See discussion in following section. 



A2 Resolving haloes 

Studies of the mass function at low redshift have found that 
the mass function is adequately sampled for haloes contain- 
ing as few as 20 particles (e.g. Jenkins et al. 2001). Because 
we are exploring new regimes in mass and redshift, it is 
necessary to confirm that we model haloes with sufficient 
particle numbers to resolve the mass function. The FOF 
mass function for haloes of very few particles is typically 
enhanced artificially as spurious groupings are increasingly 
common for small particle numbers. 

Our resolution tests consists of an identical volume, sim- 
ulated at multiple mass resolutions. The mass function of a 
2.5 h^Mpc box at resolutions of 200 3 , 500 3 , and 1000 3 is 



shown in Figure IA2l At each resolution, the mass function 
has an upturn below approximately 30-40 particles. Addi- 
tionally, the mass function is suppressed over the range of 
~30 to ~100 particles, at a level that appears to depend 
on redshift. At scales larger than approximately 100 parti- 
cles, the mass function at multiple resolutions agrees within 
the uncertainties. For this reason, we limit our analysis to 
haloes of at least 100 particles. This resolution limit is signif- 
icantly higher than found necessary at low redshift in many 
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Figure Al. Differential raw (unadjusted for finite volumes) simulated mass function of friends- of- friends dark matter haloes for redshifts 
10, 20, and 30 is shown for several run parameters. Unless otherwise noted, each run used the parameters implemented throughout this 
paper (zstart = 119, Af orce a cc = 0.005, r 8c .ft = lmean/20, At = 0.02) , except that the green (Af orco acc ), and magenta (r so f t ) curves used 
Zstart = 119- Here haloes are plotted down to 20 particles per halo versus the 100 particle minimum imposed throughout the paper; 
particle mass is 1.1 x 10 3 h- 1 M Q . 



previous works. Even with this conservative particle reso- 
lution limit, some small bias in the mass function cannot 
be ruled out fully for redshifts 20 or higher. The increased 
sensitivity of the mass function to particle resolution as red- 
shift increases is likely enhanced by the increased steepness 
of the mass function in this regime for haloes formed from 
rare fluctuations. The suppression of the mass function for 
haloes of fewer than 100 particles may be due to transient 
effects that become unimportant by lower redshifts. These 
effects could include errors introduced as a result of inac- 
curacies of the Zel'dovich approximation (Zel'dovich 1970) 
where initial particle positions and velocities are computed 
based on the initial density field, which is assumed to be en- 
tirely linear. If so, then these transients could be reduced by 
using second order Lagrangian perturbation theory (2LPT; 
Scoccimarro 1998, Crocce, Pueblas & Scoccimarro 2006) to 
set up initial conditions. Use of 2LPT may also reduce the 
required starting redshift. Further investigation is required 
to determine more fully the sources of increased sensitivity 
to mass resolution on the mass function. 

A3 Halo finders: friends-of-friends (FOF) versus 
spherical overdensity (SO) 

Choice of halo finder can have a major impact on halo masses 
and on the mass function. It is thus worth considering how 
our results would change had we used a different halo finder. 
In this case, the FOF mass function is compared with the 
mass function produced by the spherical overdensity SO al- 
gorithm (Lacey & Cole 1994), which identifies spheres of a 
specified overdensity. FOF is computationally efficient and 
will select objects of any shape provided that they meet a lo- 
cal particle density. However, FOF may also spuriously link 
together neighbouring haloes, which is a potential issue for 
low mass, high redshift haloes which form at scales where 
mass fluctuation spectrum is steep, and result in highly el- 
lipsoidal halo shapes (Gao et al. 2005). Because SO assumes 
haloes are spheres, it is not ideal for highly ellipsoidal haloes. 
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Figure A2. Residuals of the differential raw (unadjusted for fi- 
nite volumes) mass function for a 2.5 h~ iJVlpc box with identical 
initial density fluctuations modelled at 3 different mass resolu- 
tions (200 3 , 500 3 , and 1000 3 particles). Thick line segments de- 
note haloes of at least 100 particles, the minimum particle num- 
ber that we implement throughout the paper. Thin line segments 
show the mass function down to 20 particles per halo. 



However, SO has an advantage over FOF in that it is less 
likely spuriously to link together neighbouring haloes or to 
misclassify highly ellipsoidal but unvirialized structures as 
haloes. 

We have tested the SO algorithm assuming the spherical 
tophat model, in which the ACDM virial overdensity, A v i r , 
in units of the mean density is 178 at high redshifts, when 
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Figure A3, friends- of- friends (FOF, thin lines) and spherical 
overdensity (SO, thick lines) halo differential raw (unadjusted for 
finite volumes) mass functions for particle masses of ~ 10 3 , 10 5 , 
and 10 7 h~ 1 M Q (box size of 1.0, 4.6, and 20 /i _1 Mpc with 400 3 
particles). SO density is 178 times the mean density. Curves are 
plotted down to 20 particles per halo. 



Q m — 1. We have computed the SO mass function for three 
simulations of particle mass resolution ~ 10 3 , 10 5 , and 10 7 
/i _1 M at redshifts 10, 20, and 30. In Fi gure lA3l we show the 
SO and FOF mass functions for these outputs. In general, 
the SO mass function is lower than the FOF mass function, 
though the two mass functions are consistent for much of 
the redshift 10 mass range. The difference between the two 
mass functions increases with mass and redshift, ranging 
from 10% at redshift 10 and is generally less than a fac- 
tor of 2. Some caution should be taken when considering the 
SO mass function because of its particle number dependence 
in this implementation of the SO halo finder. Initial candi- 
date centres for SO haloes were identified by finding density 
peaks, where local density was computed for each particle, 
smoothed by its 32 nearest neighbours. Spheres were then 
grown outward until the desired overdensity was reached. 
This means that SO haloes with masses approaching and 
below the 32 particle smoothing mass will be suppressed. 

While there are some differences in the mass function 
for the two means of identifying haloes, these differences are 
generally smaller than the differences introduced by adopt- 
ing the WMAP 3 year cosmological parameters versus the 
larger as = 0.9 and n s = 1.0 used in the simulations of this 
paper. 



-So. 9 

"^0.8 
§ 

CO. 7 
0.6 

0.5 



0.4 - 



II 



Q 



[ 2.5 h~'Mpc run, a= Vn 
)j( random subvols, a= rmsJ 

(|)low <5 subvols, G— rms 



I 



o 



C) 



() 



±J_ 



10 5 



io 6 



10 7 

log I0 M [h-'M e 



10 8 



Figure Bl. Comparison of uncertainty estimators using the 
differential raw (unadjusted for finite volumes) mass functions 
for particle masses of z=10. Filled squares (black) show the 2.5 
h —1 Mpc run with poisson errors (where uncertainty is assumed to 
be equal to the square root of the number of haloes in each mass 
bin, as throughout the paper). Large X's (blue) denote the mean 
and rms variance of the mass function of 11 random 1 h~ 1 Mpc 
subvolume cubes selected from the same volume. Open circles 
(cyan) are the mean and rms of 3 low overdensity 1 /i _1 Mpc 
subvolmcs. 



of uncertainty is much smaller than the "bootstrap" uncer- 
tainty computed by taking the rms variance between random 
subsamples. The scatter between 3 non-overlapping subsam- 
ples with low overdensity, < 10 -3 ), is much smaller, and 
is in fact comparable to the scatter between individual sim- 
ulations of 1 fe _1 Mpc cubes. Although there are too few low 
overdensity subvolumes for a truly representative sample, 
their similarity suggests that most of the variance between 
the random subvolumes is due to non-zero mean density. 

Between the 11 small (1 h~ Mpc) simulations, the run 
to run variance is comparable to their individual average 
poisson uncertainty. This suggests that poisson uncertainty 
provides a reasonable estimate of the true uncertainty pro- 
vided that other finite volume effects, such as missing large 
scale power, are taken into account (see earlier discussion). 
The mean mass function for low mass haloes is consis- 
tent among these subvolumes and small volume simulations. 
However, for the low overdensity subvolumes, the mean mass 
function is deficient in massive haloes. 



APPENDIX B: UNCERTAINTIES AND RUN 
TO RUN SCATTER 

In this section, we discuss uncertainties and run to run vari- 
ance using the 2.5 /i _1 Mpc and 1 /i _1 Mpc boxes at z=10 as 
examples. Fig. IB UB2I shows that the poisson (y/n) estimate 
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Figure B2. Comparison of uncertainty estimators using the 
differential raw (unadjusted for finite volumes) mass functions 
for particle masses of z=10. Filled squares (black) show the 2.5 
h~ Mpc run with y/n poisson errors. Triangles (magenta) denote 
the mean mass function of the eleven 1 h~ 1 Mpc simulations with 
poisson uncertainty (magenta error bars) and run to run rms vari- 
ance (red error bars). 
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